An Efficient Data Indexing Approach on Hadoop Using Java Persistence API
Abstract
Data indexing is common in data mining when working with high-dimensional, large-scale data sets. Hadoop, a cloud computing project that implements the MapReduce framework in Java, has attracted significant interest in distributed data mining. To resolve the problems of globalization, random writes, and duration in Hadoop, this paper elaborates a data indexing approach on Hadoop that uses the Java Persistence API (JPA), illustrated through an implementation of a KD-tree algorithm on Hadoop. An improved intersection algorithm for distributed data indexing on Hadoop is also proposed; it runs in O(M + log N) time and is suitable when multiple intersections are needed. We compare the data indexing algorithms on an open dataset and a synthetic dataset in a modest cloud environment. The results show that the algorithms are feasible for large-scale data mining.
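The abstract names a KD-tree as the index being built. The following minimal sketch illustrates on a single machine what such an index does. It builds a 2-D KD-tree by median splitting and answers axis-aligned range queries. This is a generic textbook KD-tree and not the authors' Hadoop/JPA implementation. The class name `KdTree`, the fixed two dimensions, and the `build`/`rangeQuery` methods are hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical, illustrative in-memory 2-D KD-tree (textbook version).
 * Not the paper's distributed Hadoop/JPA implementation.
 * Points are double[2] arrays {x, y}.
 */
final class KdTree {
    private static final int K = 2; // number of dimensions (assumed 2-D)

    /** A tree node: one stored point plus the axis it splits on. */
    private static final class Node {
        final double[] point;
        final int axis;
        Node left, right;

        Node(double[] point, int axis) {
            this.point = point;
            this.axis = axis;
        }
    }

    private final Node root;

    private KdTree(Node root) {
        this.root = root;
    }

    /** Builds a balanced tree by recursively splitting on the median point. */
    static KdTree build(List<double[]> points) {
        double[][] copy = points.toArray(new double[0][]);
        return new KdTree(build(copy, 0, copy.length, 0));
    }

    private static Node build(double[][] pts, int from, int to, int depth) {
        if (from >= to) {
            return null;
        }
        int axis = depth % K; // cycle through x, y, x, ...
        // Sort only the current slice on this axis; its median becomes the splitter.
        Arrays.sort(pts, from, to, Comparator.comparingDouble(p -> p[axis]));
        int mid = (from + to) >>> 1;
        Node node = new Node(pts[mid], axis);
        node.left = build(pts, from, mid, depth + 1);     // values <= median on this axis
        node.right = build(pts, mid + 1, to, depth + 1);  // values >= median on this axis
        return node;
    }

    /** Returns every point p with lo[d] <= p[d] <= hi[d] in all dimensions d. */
    List<double[]> rangeQuery(double[] lo, double[] hi) {
        List<double[]> out = new ArrayList<>();
        rangeQuery(root, lo, hi, out);
        return out;
    }

    private static void rangeQuery(Node node, double[] lo, double[] hi, List<double[]> out) {
        if (node == null) {
            return;
        }
        double[] p = node.point;
        boolean inside = true;
        for (int d = 0; d < K; d++) {
            if (p[d] < lo[d] || p[d] > hi[d]) {
                inside = false;
                break;
            }
        }
        if (inside) {
            out.add(p);
        }
        // Prune: descend only into subtrees whose half-space can intersect the box.
        int a = node.axis;
        if (lo[a] <= p[a]) {
            rangeQuery(node.left, lo, hi, out);
        }
        if (hi[a] >= p[a]) {
            rangeQuery(node.right, lo, hi, out);
        }
    }

    public static void main(String[] args) {
        List<double[]> pts = List.of(
                new double[] {2, 3}, new double[] {5, 4}, new double[] {9, 6},
                new double[] {4, 7}, new double[] {8, 1}, new double[] {7, 2});
        KdTree tree = KdTree.build(pts);
        // Query the axis-aligned box [3, 8] x [1, 5] and print each hit.
        for (double[] p : tree.rangeQuery(new double[] {3, 1}, new double[] {8, 5})) {
            System.out.println(Arrays.toString(p));
        }
    }
}
```

For the six sample points in `main`, the query box [3, 8] × [1, 5] should report the three points (5, 4), (7, 2) and (8, 1), in tree-traversal order. This sketch does not cover distribution. In the abstract's setting, the tree or its partitions would also have to be persisted and shared across Hadoop tasks, and that is the part the paper addresses with JPA.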
Similar resources
Engineering Projects and Support (Pre and Final Semester)
Open Source Science Clouds Cloud Computing hosting Architecture Architecture of Web-EDA system based on Cloud computing and application for project management of IC design An Efficient Data Mining Framework on Hadoop using Java Persistence API Volunteer Computing and Desktop Cloud: The Cloud@Home Paradigm Model inter comparison study: cloud-radiative forcing and feedback's A Taxonomy and Survey...
Transactions on Big Data: A Distributed
Java 8 has introduced new capabilities such as lambda expressions and streams which simplify data-parallel computing. However, as a base language for Big Data systems, it still lacks a number of important capabilities such as processing very large datasets and distributing the computation over multiple machines. This paper gives an overview of the Java 8 Streams API and proposes extensions to a...
JPA Criteria Queries over RDF Data
We present the design and implementation of a prototype system for querying RDF data via the Java Persistence API (JPA) criteria query feature. The JPA is a specification for management of (primarily, but not limited to) relational data. It comprises a set of Java interfaces, annotations, and the JPA query language (JPQL) and thus provides a framework for uniform persistence and retrieval of Ja...
Publication date: 2010